跳到主要内容

2025-05-13-20-00

Incomplete In-context Learning

Abstract

arXiv:2505.07251v1 Announce Type: cross Abstract: Large vision language models (LVLMs) achieve remarkable performance through Vision In-context Learning (VICL), a process that depends significantly on demonstrations retrieved from an extensive collection of annotated examples (retrieval database). Existing studies often assume that the retrieval database contains annotated examples for all labels. However, in real-world scenarios, delays in database updates or incomplete data annotation may result in the retrieval database containing labeled samples for only a subset of classes. We refer to this phenomenon as an \textbf{incomplete retrieval database} and define the in-context learning under this condition as \textbf{Incomplete In-context Learning (IICL)}. To address this challenge, we propose \textbf{Iterative Judgments and Integrated Prediction (IJIP)}, a two-stage framework designed to mitigate the limitations of IICL. The Iterative Judgments Stage reformulates an (\boldsymbol{m})-class classification problem into a series of (\boldsymbol{m}) binary classification tasks, effectively converting the IICL setting into a standard VICL scenario. The Integrated Prediction Stage further refines the classification process by leveraging both the input image and the predictions from the Iterative Judgments Stage to enhance overall classification accuracy. IJIP demonstrates considerable performance across two LVLMs and two datasets under three distinct conditions of label incompleteness, achieving the highest accuracy of 93.9%. Notably, even in scenarios where labels are fully available, IJIP still achieves the best performance of all six baselines. Furthermore, IJIP can be directly applied to \textbf{Prompt Learning} and is adaptable to the \textbf{text domain}.

摘要

大规模视觉语言模型(LVLMs)通过视觉上下文学习(VICL)展现出卓越性能,该过程高度依赖于从大量标注样本库(检索数据库)中获取的示例。现有研究通常假设检索数据库包含所有类别的标注样本。然而在实际场景中,数据库更新延迟或数据标注不完整可能导致检索数据库仅涵盖部分类别的标注样本。我们将此现象称为不完整检索数据库,并将该条件下的上下文学习定义为不完整上下文学习(IICL)。为应对这一挑战,本文提出迭代判断与集成预测(IJIP)——一个两阶段框架以缓解IICL的局限性。迭代判断阶段将oldsymbol{m}类分类问题重构为oldsymbol{m}个二元分类任务,从而将IICL场景转化为标准VICL问题。集成预测阶段则通过联合利用输入图像和迭代判断阶段的预测结果,进一步提升整体分类准确率。IJIP在两种LVLMs和两个数据集上的三类标签不完整条件下均表现出优异性能,最高准确率达93.9%。值得注意的是,即使在标签完全可用的场景中,IJIP仍优于所有六种基线方法。此外,IJIP可直接应用于提示学习,并能适配文本领域